Streaming Pointwise Mutual Information

نویسندگان

  • Benjamin Van Durme
  • Ashwin Lall
چکیده

Recent work has led to the ability to perform space efficient, approximate counting over large vocabularies in a streaming context. Motivated by the existence of data structures of this type, we explore the computation of associativity scores, otherwise known as pointwise mutual information (PMI), in a streaming context. We give theoretical bounds showing the impracticality of perfect online PMI computation, and detail an algorithm with high expected accuracy. Experiments on news articles show our approach gives high accuracy on real world data.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Finding Heavily-Weighted Features with the Weight-Median Sketch

We introduce the Weight-Median Sketch, a sub-linear space data structure that captures the most heavily weighted features in linear classifiers trained over data streams. This enables memory-limited execution of several statistical analyses over streams, including online feature selection, streaming data explanation, relative deltoid detection, and streaming estimation of pointwise mutual infor...

متن کامل

Probability Mass Exclusions and the Directed Components of Pointwise Mutual Information

The pointwise mutual information quantifies the mutual information between events x and y from random variable X and Y . This article considers the pointwise mutual information in a directed sense, examining precisely how an event y provides information about x via probability mass exclusions. Two distinct types of exclusions are identified—namely informative and misinformative exclusions. Then...

متن کامل

Normalized (Pointwise) Mutual Information in Collocation Extraction

In this paper, we discuss the related information theoretical association measures of mutual information and pointwise mutual information, in the context of collocation extraction. We introduce normalized variants of these measures in order to make them more easily interpretable and at the same time less sensitive to occurrence frequency. We also provide a small empirical study to give more ins...

متن کامل

Two Multivariate Generalizations of Pointwise Mutual Information

Since its introduction into the NLP community, pointwise mutual information has proven to be a useful association measure in numerous natural language processing applications such as collocation extraction and word space models. In its original form, it is restricted to the analysis of two-way co-occurrences. NLP problems, however, need not be restricted to twoway co-occurrences; often, a parti...

متن کامل

Weakly Supervised Object Detection with Pointwise Mutual Information

In this work a novel approach for weakly supervised object detection that incorporates pointwise mutual information is presented. A fully convolutional neural network architecture is applied in which the network learns one filter per object class. The resulting feature map indicates the location of objects in an image, yielding an intuitive representation of a class activation map. While tradit...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2009